38 research outputs found
Real Voice and TTS Accent Effects on Intelligibility and Comprehension for Indian Speakers of English as a Second Language
Abstract We investigate the effect of accent on comprehension of English for speakers of English as a second language in southern India. Subjects were exposed to real and TTS voices with US and several Indian accents, and were tested for intelligibility and comprehension. Performance trends indicate a measurable advantage for familiar accents, and are broken down by various demographic factors
''Fifty Shades of Bias'': Normative Ratings of Gender Bias in GPT Generated English Text
Language serves as a powerful tool for the manifestation of societal belief
systems. In doing so, it also perpetuates the prevalent biases in our society.
Gender bias is one of the most pervasive biases in our society and is seen in
online and offline discourses. With LLMs increasingly gaining human-like
fluency in text generation, gaining a nuanced understanding of the biases these
systems can generate is imperative. Prior work often treats gender bias as a
binary classification task. However, acknowledging that bias must be perceived
at a relative scale; we investigate the generation and consequent receptivity
of manual annotators to bias of varying degrees. Specifically, we create the
first dataset of GPT-generated English text with normative ratings of gender
bias. Ratings were obtained using Best--Worst Scaling -- an efficient
comparative annotation framework. Next, we systematically analyze the variation
of themes of gender biases in the observed ranking and show that
identity-attack is most closely related to gender bias. Finally, we show the
performance of existing automated models trained on related concepts on our
dataset.Comment: Camera-ready version in EMNLP 202
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Leveraging shared learning through Massively Multilingual Models,
state-of-the-art machine translation models are often able to adapt to the
paucity of data for low-resource languages. However, this performance comes at
the cost of significantly bloated models which are not practically deployable.
Knowledge Distillation is one popular technique to develop competitive,
lightweight models: In this work, we first evaluate its use to compress MT
models focusing on languages with extremely limited training data. Through our
analysis across 8 languages, we find that the variance in the performance of
the distilled models due to their dependence on priors including the amount of
synthetic data used for distillation, the student architecture, training
hyperparameters and confidence of the teacher models, makes distillation a
brittle compression mechanism. To mitigate this, we explore the use of
post-training quantization for the compression of these models. Here, we find
that while distillation provides gains across some low-resource languages,
quantization provides more consistent performance trends for the entire range
of languages, especially the lowest-resource languages in our target set.Comment: 16 Pages, 7 Figures, Accepted to WMT 2022 (Research Track
Breaking Language Barriers with a LEAP: Learning Strategies for Polyglot LLMs
Large language models (LLMs) are at the forefront of transforming numerous
domains globally. However, their inclusivity and effectiveness remain limited
for non-Latin scripts and low-resource languages. This paper tackles the
imperative challenge of enhancing the multilingual performance of LLMs,
specifically focusing on Generative models. Through systematic investigation
and evaluation of diverse languages using popular question-answering (QA)
datasets, we present novel techniques that unlock the true potential of LLMs in
a polyglot landscape. Our approach encompasses three key strategies that yield
remarkable improvements in multilingual proficiency. First, by meticulously
optimizing prompts tailored for polyglot LLMs, we unlock their latent
capabilities, resulting in substantial performance boosts across languages.
Second, we introduce a new hybrid approach that synergizes GPT generation with
multilingual embeddings and achieves significant multilingual performance
improvement on critical tasks like QA and retrieval. Finally, to further propel
the performance of polyglot LLMs, we introduce a novel learning algorithm that
dynamically selects the optimal prompt strategy, LLM model, and embeddings per
query. This dynamic adaptation maximizes the efficacy of LLMs across languages,
outperforming best static and random strategies. Our results show substantial
advancements in multilingual understanding and generation across a diverse
range of languages
Automatically identifying vocal expressions for music transcription
ABSTRACT Music transcription has many uses ranging from music information retrieval to better education tools. An important component of automated transcription is the identification and labeling of different kinds of vocal expressions such as vibrato, glides, and riffs. In Indian Classical Music such expressions are particularly important since a raga is often established and identified by the correct use of these expressions. It is not only important to classify what the expression is, but also when it starts and ends in a vocal rendition. Some examples of such expressions that are key to Indian music are Meend (vocal glides) and Andolan (very slow vibrato). In this paper, we present an algorithm for the automatic transcription and expression identification of vocal renditions with specific application to North Indian Classical Music. Using expert human annotation as the ground truth, we evaluate this algorithm and compare it with two machinelearning approaches. Our results show that we correctly identify the expressions and transcribe vocal music with 85% accuracy. As a part of this effort, we have created a corpus of 35 voice recordings, of which 12 recordings are annotated by experts. The corpus is available for download 1
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Large Language Models (LLMs) have demonstrated impressive performance on
Natural Language Processing (NLP) tasks, such as Question Answering,
Summarization, and Classification. The use of LLMs as evaluators, that can rank
or score the output of other models (usually LLMs) has become increasingly
popular, due to the limitations of current evaluation techniques including the
lack of appropriate benchmarks, metrics, cost, and access to human annotators.
While LLMs are capable of handling approximately 100 languages, the majority of
languages beyond the top 20 lack systematic evaluation across various tasks,
metrics, and benchmarks. This creates an urgent need to scale up multilingual
evaluation to ensure a precise understanding of LLM performance across diverse
languages. LLM-based evaluators seem like the perfect solution to this problem,
as they do not require human annotators, human-created references, or
benchmarks and can theoretically be used to evaluate any language covered by
the LLM. In this paper, we investigate whether LLM-based evaluators can help
scale up multilingual evaluation. Specifically, we calibrate LLM-based
evaluation against 20k human judgments of five metrics across three
text-generation tasks in eight languages. Our findings indicate that LLM-based
evaluators may exhibit bias towards higher scores and should be used with
caution and should always be calibrated with a dataset of native speaker
judgments, particularly in low-resource and non-Latin script languages
Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi
In this paper we discuss an in-progress work on the development of a speech
corpus for four low-resource Indo-Aryan languages -- Awadhi, Bhojpuri, Braj and
Magahi using the field methods of linguistic data collection. The total size of
the corpus currently stands at approximately 18 hours (approx. 4-5 hours each
language) and it is transcribed and annotated with grammatical information such
as part-of-speech tags, morphological features and Universal dependency
relationships. We discuss our methodology for data collection in these
languages, most of which was done in the middle of the COVID-19 pandemic, with
one of the aims being to generate some additional income for low-income groups
speaking these languages. In the paper, we also discuss the results of the
baseline experiments for automatic speech recognition system in these
languages.Comment: Speech for Social Good Workshop, 2022, Interspeech 202
Phone Merging for Code-switched Speech Recognition
International audienceSpeakers in multilingual communities often switch between or mix multiple languages in the same conversation. Automatic Speech Recognition (ASR) of code-switched speech faces many challenges including the influence of phones of different languages on each other. This paper shows evidence that phone sharing between languages improves the Acoustic Model performance for Hindi-English code-switched speech. We compare base-line system built with separate phones for Hindi and English with systems where the phones were manually merged based on linguistic knowledge. Encouraged by the improved ASR performance after manually merging the phones, we further investigate multiple data-driven methods to identify phones to be merged across the languages. We show detailed analysis of automatic phone merging in this language pair and the impact it has on individual phone accuracies and WER. Though the best performance gain of 1.2% WER was observed with manually merged phones, we show experimentally that the manual phone merge is not optimal